
Update Friendly Evaluator#46076

Open
w-javed wants to merge 10 commits into feature/azure-ai-projects/2.0.2 from
waqasjaved02/friendly-evaluator-properties-output

Conversation

@w-javed
Contributor

@w-javed w-javed commented Apr 2, 2026

No description provided.

@github-actions

github-actions bot commented Apr 2, 2026

API Change Check

APIView identified API-level changes in this PR and created the following API reviews:

azure-ai-projects

w-javed and others added 4 commits April 2, 2026 18:37
Update the FriendlyEvaluator sample to return the new standard output
format with score, label, reason, threshold, and passed at the top level.
Extra evaluator output fields (explanation, tone, confidence) are nested
under a properties dict.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
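The commit above describes the new standard output shape but does not show it; a minimal sketch, assuming the layout described (score, label, reason, threshold, passed at the top level; extra fields under a properties dict). The helper name build_result and the field values are illustrative, not part of the SDK.

```python
# Hypothetical sketch of the standard evaluator output format described
# above. build_result is an illustrative helper, not an SDK function.

def build_result(score: float, threshold: float, *, tone: str, confidence: float) -> dict:
    """Assemble an evaluation result in the standard output format."""
    passed = score >= threshold
    return {
        # Required top-level fields.
        "score": score,
        "label": "pass" if passed else "fail",
        "reason": f"score {score} vs threshold {threshold}",
        "threshold": threshold,
        "passed": passed,
        # Extra evaluator-specific fields (e.g. tone, confidence) are
        # nested under "properties" rather than placed at the top level.
        "properties": {"tone": tone, "confidence": confidence},
    }
```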
- Use 'from openai import OpenAI' instead of AzureOpenAI
- Accept api_key and model params instead of model_config dict
- Use client.responses.create() instead of chat.completions.create()
- Update util.py: split build_evaluation_messages into
  build_evaluation_instructions() and build_evaluation_input()
- Update sample init_parameters schema accordingly

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
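The util.py split mentioned above might look like the following sketch: one helper for the static instructions and one for the per-sample input, which map naturally onto the `instructions` and `input` arguments of `client.responses.create()`. The function bodies and prompt wording here are assumptions, not the actual sample code.

```python
# Hypothetical sketch of splitting build_evaluation_messages() into two
# helpers, as the commit describes. Prompt text is illustrative.

def build_evaluation_instructions(criteria: str) -> str:
    """Static instructions, passed once per evaluation request."""
    return (
        "You are an evaluator. Rate the response against this criterion: "
        f"{criteria}. Reply with a score from 1 to 5."
    )

def build_evaluation_input(query: str, response: str) -> str:
    """Per-sample input combining the user query and model response."""
    return f"Query: {query}\nResponse: {response}"

# With the OpenAI Responses API, these feed the `instructions` and `input`
# parameters of client.responses.create(model=..., instructions=..., input=...).
```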
Address aprilk-ms review: annotate which fields in the evaluation
result dict are required vs optional for the evaluation service.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Align sample_eval_upload_friendly_evaluator.py with the updated
FriendlyEvaluator that takes api_key and model instead of
deployment_name/model_config.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@w-javed w-javed force-pushed the waqasjaved02/friendly-evaluator-properties-output branch from 5c23650 to 805a66c on April 3, 2026 at 01:38
w-javed and others added 3 commits April 2, 2026 23:31
Merge sample_custom_evaluator_friendly_evaluator.py into
sample_eval_upload_friendly_evaluator.py so the sample first runs
FriendlyEvaluator locally, then uploads, creates eval, and runs it.
Fix model_name parameter to match evaluator __init__ signature.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dargilco
Member

dargilco commented Apr 3, 2026

Reminder to run 'black' tool. Thanks!
black --config ../../../eng/black-pyproject.toml .

@dargilco
Member

dargilco commented Apr 3, 2026

Regarding the MyPy error, note that Azure SDK has recently updated their tools. They no longer use "tox". This is the new command to run MyPy: azpysdk mypy . (see https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/tests.md#running-checks-locally)

w-javed and others added 2 commits April 3, 2026 07:11
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -8,7 +8,7 @@ def __init__(self, *, config: str, threshold, **kwargs):
Member


threshold not needed? what is config?

Contributor Author


I guess I can remove config from this simple evaluator and just use a threshold of 50 characters: if the response length exceeds the threshold, it passes; otherwise it fails. That would be a simple sample.
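The simplified evaluator proposed in this comment could be sketched as follows. The class name and exact fields are illustrative assumptions, not the merged sample code.

```python
# Minimal sketch of the length-threshold evaluator described in the comment:
# no config parameter, just a fixed character-count threshold.

class LengthEvaluator:
    def __init__(self, *, threshold: int = 50):
        self.threshold = threshold

    def __call__(self, *, response: str) -> dict:
        # Pass when the response is longer than the threshold.
        passed = len(response) > self.threshold
        return {
            "score": len(response),
            "label": "pass" if passed else "fail",
            "threshold": self.threshold,
            "passed": passed,
        }
```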

"reason": result.get("reason", "No reason provided"),
"explanation": result.get("explanation", "No explanation provided"),
"threshold": threshold,
"passed": passed,
Member


Thinking more about it, I would prefer to just mention in a comment that passed can be calculated in the evaluator logic, but not actually implement it (or perhaps leave it commented out). I hope this is an unusual case: normally the user will set up threshold/default/direction in the evaluator metadata and let us do the calculation.
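The reviewer's suggestion could be sketched as follows: the evaluator returns only the raw score, and a comment notes that pass/fail is derived by the service. The metadata field names (threshold/default/direction) are taken from the comment above; everything else here is an illustrative assumption.

```python
# Sketch of the reviewer's preference: omit "passed" from the evaluator
# output and let the evaluation service derive it from metadata.

def evaluate(score: float) -> dict:
    return {
        "score": score,
        # "passed" could be computed here, e.g.:
        #   "passed": score >= threshold,
        # but normally the evaluation service derives pass/fail from the
        # threshold/default/direction declared in the evaluator metadata.
    }
```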

folder structure (common_util/) using `evaluators.upload()`.
2. Create an evaluation (eval) that references the uploaded evaluator.
3. Run the evaluation with inline data and poll for results.
1. Run the FriendlyEvaluator standalone to verify it works locally.
Member

@aprilk-ms aprilk-ms Apr 3, 2026


The file name is a bit weird. Can we replace "friendly" with what we are trying to demonstrate? We have 2 samples; maybe "basic" and "advanced" if it's hard to be more specific?

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
